Transmitting Apparatus and Method Thereof for Video Processing

ABSTRACT

The present invention relates to a method and a transmitting apparatus for encoding a bitstream representing a sequence of pictures of a video stream comprising a processor and memory, said memory containing instructions executable by said processor whereby said transmitting apparatus is operative to: send a syntax element, wherein a value of the syntax element is indicative of restrictions that are enforced on the bitstream in a way that guarantees a certain level of parallelism for decoding the bitstream.

RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 14/348,626, filed Mar. 31, 2014, which is a U.S. National Phase Application of PCT/SE2013/050803, filed Jun. 27, 2013, which claims benefit of U.S. Provisional Application No. 61/666,056, filed Jun. 29, 2012. The entire contents of each of the aforementioned applications are incorporated herein by reference.

TECHNICAL FIELD

The embodiments relate to a method and a transmitting apparatus for improving coding performance when parallel encoding/decoding is possible.

BACKGROUND

High Efficiency Video Coding (HEVC) is a video coding standard being developed in Joint Collaborative Team-Video Coding (JCT-VC). JCT-VC is a collaborative project between Moving Picture Experts Group (MPEG) and International Telecommunication Union-Telecommunication Standardization Sector (ITU-T). Currently, an HEVC Model (HM) is defined that includes a number of tools and is considerably more efficient than H.264/Advanced Video Coding (AVC).

HEVC is a block-based hybrid video codec that uses both inter prediction (prediction from previously coded pictures) and intra prediction (prediction from previously coded pixels in the same picture). Each picture is divided into square treeblocks (corresponding to macroblocks in H.264/AVC) that can be of size 16×16, 32×32 or 64×64 pixels. A variable CtbSize denotes the size of a treeblock, expressed as the number of pixels in one dimension, i.e. 16, 32 or 64.

Regular slices are similar to those in H.264/AVC. Each regular slice is encapsulated in its own Network Abstraction Layer (NAL) unit, and in-picture prediction (intra sample prediction, motion information prediction, coding mode prediction) and entropy coding dependency across slice boundaries are disabled. Thus a regular slice can be reconstructed independently from other regular slices within the same picture. Since the treeblock, which is a basic unit in HEVC, can be of a relatively big size, e.g. 64×64, a concept of “fine granularity slices” is included in HEVC, as a special form of regular slices, to allow for Maximum Transmission Unit (MTU) size matching through slice boundaries within a treeblock. The slice granularity is signaled in a picture parameter set, whereas the address of a fine granularity slice is still signaled in a slice header.

The regular slice is the only tool that can be used for parallelization in H.264/AVC. Parallelization implies that parts of a single picture can be encoded and decoded in parallel, as illustrated in FIG. 1, where threaded decoding is performed using slices. Parallelization based on regular slices does not require much inter-processor or inter-core communication. However, for the same reason, regular slices can require some coding overhead due to the bit cost of the slice header and due to the lack of prediction across the slice border. Further, regular slices (in contrast to some of the other tools mentioned below) also serve as the key mechanism for bitstream partitioning to match MTU size requirements, due to the in-picture independence of regular slices and the fact that each regular slice is encapsulated in its own NAL unit. In many cases, the goal of parallelization and the goal of MTU size matching place contradicting demands on the slice layout in a picture. The realization of this situation led to the development of the parallelization tools mentioned below.

In wavefront parallel processing (WPP), the picture is partitioned into single rows of treeblocks. Entropy decoding and prediction are allowed to use data from treeblocks in other partitions. Parallel processing is possible through parallel decoding of rows of treeblocks, where the start of the decoding of a row is delayed by two treeblocks, so as to ensure that data related to a treeblock above and to the right of the subject treeblock is available before the subject treeblock is decoded. Using this staggered start (which appears like a wavefront when represented graphically as illustrated in FIG. 2), parallelization is possible with up to as many processors/cores as the picture contains treeblock rows. Due to the permissiveness of in-picture prediction between neighboring treeblock rows within a picture, the required inter-processor/inter-core communication to enable in-picture prediction can be substantial. The WPP partitioning does not result in the production of additional NAL units compared to when it is not applied, thus WPP cannot be used for MTU size matching. A wavefront segment contains exactly one line of treeblocks.
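The staggered start can be illustrated with a short sketch. It assumes, purely for illustration, that every treeblock takes one time unit to decode and that one core is available per treeblock row; the function name `wpp_schedule` is hypothetical.

```python
def wpp_schedule(rows: int, cols: int, delay: int = 2) -> list[list[int]]:
    """Earliest time step at which each treeblock can start decoding.

    Assumptions (illustrative only): unit decoding time per treeblock and
    one core per row. Each row starts `delay` treeblocks after the row
    above it, so the treeblock above and to the right is already decoded.
    """
    return [[col + delay * row for col in range(cols)] for row in range(rows)]


schedule = wpp_schedule(rows=3, cols=5)
for row in schedule:
    print(row)
```

Treeblocks that share a time step lie on the same wavefront and can be decoded in parallel.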

Tiles define horizontal and vertical boundaries that partition a picture into tile columns and rows. That implies that the tiles in HEVC divide a picture into areas with a defined width and height as illustrated in FIG. 3. Each area of the tiles consists of an integer number of treeblocks that are processed in raster scan order. The tiles themselves are processed in raster scan order throughout the picture. The exact tile configuration or tile information (number of tiles, width and height of each tile etc.) can be signaled in a sequence parameter set (SPS) and in a picture parameter set (PPS). The tile information contains the width, height and position of each tile in a picture. This means that if the coordinates of a block are known, it is also known what tile the block belongs to.
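The lookup from block coordinates to tile can be sketched as follows. The sketch assumes that the tile information is available as lists of tile column widths and tile row heights measured in treeblocks; these inputs and the function name `tile_index` are hypothetical and do not reproduce the actual SPS/PPS syntax.

```python
from bisect import bisect_right
from itertools import accumulate


def tile_index(x: int, y: int, col_widths: list[int], row_heights: list[int]) -> int:
    """Raster-scan index of the tile that contains treeblock (x, y).

    col_widths and row_heights are tile sizes in treeblocks
    (hypothetical representation of the tile information).
    """
    col_ends = list(accumulate(col_widths))    # exclusive right edges
    row_ends = list(accumulate(row_heights))   # exclusive bottom edges
    if not (0 <= x < col_ends[-1] and 0 <= y < row_ends[-1]):
        raise ValueError("treeblock lies outside the picture")
    tile_col = bisect_right(col_ends, x)
    tile_row = bisect_right(row_ends, y)
    return tile_row * len(col_widths) + tile_col


# Picture of 10x6 treeblocks with three tile columns and two tile rows.
print(tile_index(3, 2, [3, 3, 4], [2, 4]))
```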

For simplicity, restrictions on the application of the different picture partitioning schemes are specified in HEVC. Tiles and WPP may not be applied at the same time. Furthermore, for each slice and tile, either or both of the following conditions must be fulfilled: 1) all coded treeblocks in a slice belong to the same tile; 2) all coded treeblocks in a tile belong to the same slice.

The Sequence Parameter Set (SPS) holds information that is valid for an entire coded video sequence. Specifically, it holds the syntax elements profile_idc and level_idc that are used to indicate which profile and level a bitstream conforms to. Profiles and levels specify restrictions on bitstreams and hence limits on the capabilities needed to decode the bitstreams. Profiles and levels may also be used to indicate interoperability points between individual decoder implementations. The level enforces restrictions on the bitstream, for example on the picture size (denoted MaxLumaFS, expressed in luma samples) and sample rate (denoted MaxLumaPR, expressed in luma samples per second) as well as max bit rate (denoted MaxBR, expressed in bits per second) and max coded picture buffer size (denoted Max CPB size, expressed in bits).

The Picture Parameter Set (PPS) holds information that is valid for some (or all) pictures in a coded video sequence. The PPS comprises syntax elements that control the usage of wavefronts and tiles, and these are required to have the same value in all PPSs that are active in the same coded video sequence.

Moreover, both HEVC and H.264 define a video usability information (VUI) syntax structure, that can be present in a sequence parameter set and contains parameters that do not affect the decoding process, i.e. do not affect the pixel values. Supplemental Enhancement Information (SEI) is another structure that can be present in any access unit and that contains information that does not affect the decoding process.

Hence, as mentioned above, compared to H.264/AVC, HEVC provides better possibilities for parallelization. Parallelization implies that parts of a single picture can be encoded and decoded in parallel. Specifically tiles and WPP are tools developed for parallelization purposes. Both were originally designed for encoder parallelization but they may also be used for decoder parallelization.

When tiles are being used for encoder parallelism, the encoder first chooses a tile partitioning. Since tile boundaries break all predictions between the tiles, the encoder can assign the encoding of multiple tiles to multiple threads. As soon as there are at least two tiles, multiple thread encoding can be done.

Accordingly, in this context, the fact that a number of threads can be used, implies that the actual workload of the encoding/decoding process can be divided into separate “processes” that are performed independently of each other, i.e. they can be performed in parallel in separate threads.

HEVC defines two types of entry points for parallel decoding. Entry points can be used by a decoder to find the position in the bitstream where the bits for a tile or substream start. The first type is entry point offsets. Those are listed in the slice header and indicate the starting points of one or more tiles that are contained in the slice. The second type is entry point markers, which separate tiles in the bitstream. An entry point marker is a specific codeword (start code) which cannot occur anywhere else in the bitstream.

Thus, for decoder parallelism to work, there need to be entry points in the bitstream. For parallel encoding, entry points are not needed; the encoder can simply stitch the bitstream together after the encoding of the tiles/substreams is complete. However, the decoder needs to know where each tile starts in the bitstream in order to decode in parallel. If an encoder only wants to encode in parallel but does not want to enable parallel decoding, it could omit the entry points, but if it also wants to enable decoding in parallel it must insert entry points.
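How a decoder might use entry point offsets can be sketched as follows. The sketch makes a simplifying assumption that is not taken from the slice-header syntax: each offset is treated as the size in bytes of one substream, and the last substream runs to the end of the slice data. The function name `split_substreams` is hypothetical.

```python
def split_substreams(payload: bytes, entry_point_offsets: list[int]) -> list[bytes]:
    """Split slice data into substreams that can be handed to separate threads.

    Assumption (illustrative only): offset i is the byte size of substream i,
    and the final substream extends to the end of the payload.
    """
    substreams, start = [], 0
    for size in entry_point_offsets:
        substreams.append(payload[start:start + size])
        start += size
    substreams.append(payload[start:])
    return substreams


parts = split_substreams(b"abcdefgh", [3, 2])
print(parts)
```

Without the offsets, a decoder would have to decode the first substream completely before it could locate the start of the second one.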

SUMMARY

The object of the embodiments of the present invention is to improve the performance when parallel encoding/decoding is available. That is achieved by sending a syntax element, wherein a value of the syntax element is indicative of restrictions that are enforced on the bitstream in a way that guarantees a certain level of parallelism for decoding the bitstream. A decoder, receiving an encoded bitstream that is encoded according to the indication from the syntax element, can use this indication when deciding how it could decode the encoded bitstream and whether the bitstream can be decoded.

According to a first aspect of the embodiments, a method for encoding a bitstream representing a sequence of pictures of a video stream is provided. In the method, a syntax element is sent, wherein a value of the syntax element is indicative of restrictions that are enforced on the bitstream in a way that guarantees a certain level of parallelism for decoding the bitstream.

According to a second aspect, a transmitting apparatus for encoding a bitstream representing a sequence of pictures of a video stream is provided. The transmitting apparatus comprises a processor and memory. Said memory contains instructions executable by said processor whereby said transmitting apparatus is operative to send a syntax element, wherein a value of the syntax element is indicative of restrictions that are enforced on the bitstream in a way that guarantees a certain level of parallelism for decoding the bitstream.

An advantage of at least some embodiments is that they provide a means of indicating that a bitstream can be decoded in parallel. This enables a decoder that is capable of decoding in parallel to find out, before it starts decoding, whether it will be able to decode the stream. There may for example be a decoder that can decode 720p on a single thread but can decode 1080p provided that each picture is split into at least 4 independently decodable regions (i.e. it can be decoded in 4 parallel threads). By using the embodiments, the decoder will know whether each picture is split or not, i.e. whether it can be decoded in parallel or not.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an example of threaded decoding using slices according to prior art.

FIG. 2 illustrates an example of threaded decoding using wavefronts according to prior art.

FIG. 3 illustrates an example of threaded decoding using tiles according to prior art.

FIG. 4 illustrates schematically a transmitting apparatus configured to transmit the syntax element to a receiving apparatus according to an embodiment of the present invention.

FIG. 5 exemplifies parallelism levels according to prior art.

FIGS. 6, 7 a and 7 b illustrate flowcharts of the method according to embodiments of the present invention.

FIG. 8 illustrates schematically an implementation of the transmitting apparatus according to embodiments of the present invention.

DETAILED DESCRIPTION

In this specification, the term processing unit is used to refer to the unit that encodes or decodes a coded video sequence. In practice that might for example correspond to a CPU, a GPU, a DSP, an FPGA, or any type of specialized chip. It might also correspond to a unit that contains multiple CPUs, GPUs etc or combinations of those. The term core is used to refer to one part of the processing unit that is capable of performing (parts of) the encoding/decoding process in parallel with other cores of the same processor. In practice that might for example correspond to a logical core in a CPU, a physical core in a multi-core CPU, one CPU in a multi-CPU architecture or a single chip in a processing board with multiple chips.

The term thread is used to denote some processing steps that can be performed in parallel with some other processing steps. Multiple threads can thus be executed in parallel. When the number of cores in a processor is greater than or equal to the number of threads that are possible to execute, all those threads can be executed in parallel. Otherwise, some threads will start after others have finished or alternatively time-division multiplexing can be applied. One particular thread cannot be executed on multiple cores at the same time. When the decoding process at any point is divided into different parts (steps or actions) that are independent of each other and therefore can be performed in parallel we say that each such part constitutes a thread.

A bitstream is said to support a certain level of parallelism if it is created in such a way that it is possible to decode certain parts of it in parallel. In this text the focus is on parallelism within pictures, but the embodiments are not limited to that. They could be applied to any type of parallelism within video sequences.

A decoding process, for a picture that consists of three slices A, B and C, defines how to decode the picture in sequential order:

-   Decode A.

-   Decode B.

-   Decode C.

-   Perform deblocking inside and between treeblocks.

However, if a processor P has 4 cores (called i, ii, iii and iv) the decoding can be performed as follows:

1. Core i decodes slice A, core ii decodes slice B and core iii decodes slice C, all in parallel.

2. When all slices are decoded, deblocking inside and between treeblocks is performed by any core (since all four cores will be free).

This is possible because the slices are independently decodable.
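The two-step procedure can be sketched with a thread pool. The functions `decode_slice` and `deblock` are hypothetical placeholders that only return strings; the sketch shows the ordering of the steps, not an actual decoder, and Python threads are used here only to express the structure.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_slice(name: str) -> str:
    """Placeholder for decoding one independently decodable slice."""
    return f"decoded {name}"


def deblock(decoded: list[str]) -> str:
    """Placeholder for deblocking inside and between treeblocks."""
    return "deblocked(" + ", ".join(decoded) + ")"


def decode_picture(slices: list[str], cores: int = 4) -> str:
    # Step 1: every slice is handed to its own worker.
    with ThreadPoolExecutor(max_workers=cores) as pool:
        decoded = list(pool.map(decode_slice, slices))
    # Step 2: deblocking starts only after all slices are decoded.
    return deblock(decoded)


picture = decode_picture(["A", "B", "C"])
print(picture)
```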

It might be the case that each core in processor P in the example above is capable of processing a certain number of luma samples. When the decoder that uses P for decoding reports its capabilities, e.g. in the form of a level, it would be forced to report the capabilities based on the single-core decoding performance, as it would have to be prepared for bitstreams that are not constructed for parallel decoding, e.g. bitstreams that only contain a single slice per picture.

A decoder that could decode a higher level, given that the bitstream supports parallel decoding, can thus be forced to restrict its conformance claims to a lower level. That can be avoided by conveying a parallel decoding property of the stream by using a syntax element according to embodiments of the present invention.

Therefore, a method for encoding a bitstream representing a sequence of pictures of a video stream is provided. In the method, a syntax element is sent, e.g. in the SPS, wherein a value of the syntax element is indicative of restrictions that are enforced on the bitstream in a way that guarantees a certain level of parallelism for decoding the bitstream. That implies that the value of the syntax element can be used for determining the level of parallelism that the bitstream is encoded with. As illustrated in FIG. 4, the syntax element 440 is sent in a SPS 430 from a transmitting apparatus 400 to a receiving apparatus 450 via respective in/out-put units 405,455. The transmitting apparatus 400 comprises an encoder 410 for encoding the bitstream with a level of parallelism indicated by the syntax element 440. A syntax element managing unit 420 determines the level of parallelism that should be used. The level of parallelism can be determined based on information 470 of the receiving apparatus' 450 decoder 460 capabilities relating to parallel decoding.

The syntax element is also referred to as parallelism_idc which also could be denoted minSpatialSegmentation_idc. The syntax element is valid for a sequence of pictures of a video stream which can be a set of pictures or an entire video sequence.

According to one embodiment, the value of the syntax element is set equal to the level of parallelism, wherein the level of parallelism indicates the number of threads that can be used. FIG. 5 shows two examples. One picture is divided into four independent parts of equal spatial size that can be decoded in parallel, so the level of parallelism is four. The other picture is divided into two equally sized independent parts that can be decoded in parallel, so the level of parallelism is two.

The value of the syntax element is determined as illustrated in the flowchart of FIG. 6, according to this embodiment. One way of determining the value of the syntax element is to use information from the decoder about the decoder capabilities relating to parallel decoding.

Another way is that the encoder chooses the level of parallelism purely for encoder purposes and provides the parallel information to decoders since that may be useful for some decoders.

Another way is to assume a certain decoder design for the decoders that may decode the stream. If, for example, it is known that 80% of all smartphones have 4 cores, the encoder can always use 4 independent parts.

When the value of the syntax element has been determined 601, one of the restrictions, referred to as e.g. a, b and c, is imposed 603 by performing the steps specified below to fulfill 602 the requirements of that restriction. Then the syntax element is sent 604 to the decoder of the receiver and the picture is encoded according to the value of the syntax element.

According to one embodiment, the value of the syntax element is used to impose one restriction of at least a and b on the bitstream:

a: A maximum number of luma samples per slice is restricted as a function of the picture size and as a function of the value of the syntax element.

b: A maximum number of luma samples per tile is restricted as a function of the picture size and as a function of the value of the syntax element.

According to a further embodiment, the value of the syntax element is used to impose one restriction of a, b and c on the bitstream:

a: A maximum number of luma samples per slice is restricted as a function of the picture size and as a function of the value of the syntax element.

b: A maximum number of luma samples per tile is restricted as a function of the picture size and as a function of the value of the syntax element.

c: Wavefronts are used and treeblock size, picture height and picture width are restricted jointly as a function of the value of the syntax element and as a function of the picture size.

According to an embodiment, the requirements of restrictions a, b or a, b, c are fulfilled 602 by the following steps as illustrated in FIG. 7 a.

Hence, an encoder may be configured to perform the following steps in order to fulfill requirement a.

Multiple pictures constituting a video sequence are encoded and the following is performed:

The treeblocks of the pictures are divided 602 a into slices (such that each division consists of consecutive treeblocks in raster scan order) such that each division does not contain more luma samples than what is allowed by restriction a.

The treeblocks are encoded in their respective slices (in which the encoding of the different slices is performed sequentially or in parallel).
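The division into slices can be sketched as follows. The sketch assumes that the limit imposed by restriction a is already known as a number of luma samples and that every treeblock holds CtbSize×CtbSize luma samples (partial treeblocks at picture edges are ignored); the function name and the example limit are hypothetical.

```python
def split_into_slices(num_treeblocks: int, ctb_size: int,
                      max_luma_per_slice: int) -> list[range]:
    """Group consecutive treeblocks (raster scan order) into slices.

    Simplification: each treeblock is assumed to contain
    ctb_size * ctb_size luma samples.
    """
    blocks_per_slice = max_luma_per_slice // (ctb_size * ctb_size)
    if blocks_per_slice < 1:
        raise ValueError("limit is smaller than one treeblock")
    return [range(start, min(start + blocks_per_slice, num_treeblocks))
            for start in range(0, num_treeblocks, blocks_per_slice)]


# 1280x768 picture (20x12 treeblocks of 64x64), assumed limit of a quarter picture.
slices = split_into_slices(20 * 12, 64, (1280 * 768) // 4)
print([(s.start, s.stop) for s in slices])
```

Each returned range can then be encoded as one slice, sequentially or in parallel.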

An encoder may be configured to perform the following steps in order to fulfill restriction b.

Multiple pictures constituting a video sequence are encoded and the following is performed:

The pictures are divided 602 b into tiles (with horizontal and vertical lines at sample positions that are integer multiples of CtbSize (CtbSize is equal to treeblock size)) such that each tile does not contain more luma samples than what is allowed by restriction b.

The treeblocks are encoded in their respective tiles (in which the encoding of the different tiles is performed sequentially or in parallel).
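One possible way of choosing a tile configuration is sketched below. It assumes a near-uniform tile grid, full-size treeblocks, and a limit on luma samples per tile that is already known; the search strategy (fewest tiles that meet the limit) and the function name are illustrative choices, not something prescribed by the embodiments.

```python
from math import ceil


def choose_tile_grid(width_ctbs: int, height_ctbs: int, ctb_size: int,
                     max_luma_per_tile: int) -> tuple[int, int]:
    """Return (tile_cols, tile_rows) with the fewest tiles meeting the limit.

    Assumes near-uniform tile spacing, so the largest tile spans
    ceil(width / cols) by ceil(height / rows) treeblocks.
    """
    best = None
    for cols in range(1, width_ctbs + 1):
        for rows in range(1, height_ctbs + 1):
            largest = (ceil(width_ctbs / cols) * ceil(height_ctbs / rows)
                       * ctb_size * ctb_size)
            fewer = best is None or cols * rows < best[0] * best[1]
            if largest <= max_luma_per_tile and fewer:
                best = (cols, rows)
    if best is None:
        raise ValueError("limit is smaller than one treeblock")
    return best


# 1280x768 picture (20x12 treeblocks of 64x64), assumed limit of a quarter picture.
grid = choose_tile_grid(20, 12, 64, (1280 * 768) // 4)
print(grid)
```

Tile boundaries produced this way fall on integer multiples of CtbSize because the grid is expressed in treeblocks.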

An encoder may be configured to perform the following steps in order to fulfill restriction c.

Multiple pictures constituting a video sequence are encoded 602 c with wavefronts (setting tiles_or_entropy_coding_sync_idc to 2 in each PPS) and once for the entire sequence the following is performed:

Given the resolution (the height and the width in luma samples) of the (pictures in the) video sequence, the CtbSize is selected such that restriction c is fulfilled.

There are several different reasons why an encoder needs to fulfill one (or more) of the requirements for a certain value of the syntax element. The value of the syntax element is also referred to as value X. One such reason could be that the encoder is encoding a video sequence for a decoder that has indicated that it needs a specific value X or for a decoder that needs the value X to be in a specific set of values (such as higher than or equal to a certain value).

If there is no specific requirement for the value X, an encoder might want to set the value X as high as possible in order to aid decoders and in order to produce a bitstream that as many decoders as possible will be able to decode. As an example, consider the case where an encoder splits each picture into 4 equal parts, which means that X=4.0. The encoder is permitted to send X=2.0 and thereby indicate that the pictures are split into at least 2 parts, but it could be advantageous for the decoder if the encoder sends the highest possible value of X, i.e. 4.0 in this example.

An encoder may be configured to perform the following steps in order to indicate the value X according to restriction a.

1. Multiple pictures constituting a video sequence are encoded using one or more slices per picture.

2. The value X is signaled as the highest possible value for which every slice in the sequence fulfills restriction a.

An encoder may be configured to perform the following steps in order to indicate the value X according to restriction b.

1. Multiple pictures constituting a video sequence are encoded using a tile configuration that is suitable for the encoder.

2. The value X is signaled as the highest possible value for which every tile in the sequence fulfills restriction b.

An encoder may be configured to perform the following steps in order to indicate the value of X according to restriction c.

1. Multiple pictures constituting a video sequence with height equal to pic_height_in_luma_samples and width equal to pic_width_in_luma_samples and treeblock size equal to CtbSize are encoded using WPP.

2. The value X is signaled as the highest possible value for which restriction c is fulfilled.

According to a further embodiment, the syntax element denoted parallelism_idc indicates the restrictions that are enforced on the bitstream in that the value X, also referred to as parallelism, is equal to (parallelism_idc/4)+1. Similar to the embodiments described above, the level of parallelism indicates the number of threads that can be used. This is exemplified in FIG. 5 by a picture that is divided into four independent parts that can be decoded in parallel, for which the level of parallelism is four, and by a picture that is divided into two independent parts that can be decoded in parallel, for which the level of parallelism is two.

Thus the value X (i.e. the value of the syntax element) can be used for determining the level of parallelism that the bitstream is encoded with by calculating value X=parallelism=(parallelism_idc/4)+1=(syntax element/4)+1.

This embodiment presents a more specific version of the previous embodiment and all different aspects from the previous embodiments are not repeated in this embodiment, however, they can be combined and/or restricted in any suitable fashion.

According to this embodiment, the level of parallelism is calculated as: Parallelism=(parallelism_idc/m)+1

When Parallelism is greater than 1, the constraints specified below apply. The preferred value of m is 4 but other values can alternatively be used.
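The mapping from the signaled value to the level of parallelism can be sketched as follows. The sketch assumes that the division is real-valued, so that values of parallelism_idc that are not multiples of m give fractional levels; the function name is hypothetical.

```python
def parallelism_from_idc(parallelism_idc: int, m: int = 4) -> float:
    """Level of parallelism for a signaled parallelism_idc.

    Assumption: '/' is real-valued division; m = 4 is the preferred value.
    """
    if parallelism_idc < 0 or m <= 0:
        raise ValueError("parallelism_idc must be >= 0 and m must be > 0")
    return parallelism_idc / m + 1


for idc in (0, 2, 4, 12):
    print(idc, parallelism_from_idc(idc))
```

With m equal to 4, a value of 0 corresponds to no guaranteed parallelism and a value of 12 corresponds to four parallel parts.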

When Parallelism is greater than 1 701, one of the following conditions must be fulfilled 702 as illustrated in the flowchart of FIG. 7 b:

A. tiles_or_entropy_coding_sync_idc is equal to 0 in each picture parameter set activated within the coded video sequence and the maximum number of luma samples in a slice is less than or equal to floor(pic_width_in_luma_samples*pic_height_in_luma_samples/parallelism).

B. tiles_or_entropy_coding_sync_idc is equal to 1 in each picture parameter set activated within the coded video sequence and the maximum number of luma samples in a tile is less than or equal to floor(pic_width_in_luma_samples*pic_height_in_luma_samples/parallelism).

C. tiles_or_entropy_coding_sync_idc is equal to 2 in each picture parameter set activated within the coded video sequence and the syntax elements pic_width_in_luma_samples, pic_height_in_luma_samples and the variable CtbSize are restricted such that: (2*(pic_height_in_luma_samples/CtbSize)+(pic_width_in_luma_samples/CtbSize)) *CtbSize*CtbSize≤floor(MaxLumaFS/parallelism)

It should be noted that floor(x) denotes the largest integer less than or equal to x, that pic_width_in_luma_samples is the picture width and that pic_height_in_luma_samples is the picture height.
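A check of conditions A, B and C can be sketched as follows. The function name, the keyword parameters and the example values (including the MaxLumaFS value of 2228224) are assumptions made for illustration; the formulas follow the three conditions as written above, with real-valued division.

```python
from math import floor


def condition_holds(sync_idc: int, parallelism: float, pic_w: int, pic_h: int, *,
                    max_segment_luma: int = 0, ctb_size: int = 0,
                    max_luma_fs: int = 0) -> bool:
    """Check condition A (sync_idc 0), B (sync_idc 1) or C (sync_idc 2).

    sync_idc stands for tiles_or_entropy_coding_sync_idc;
    max_segment_luma is the largest slice (A) or tile (B) in luma samples.
    """
    if sync_idc in (0, 1):
        return max_segment_luma <= floor(pic_w * pic_h / parallelism)
    if sync_idc == 2:
        lhs = (2 * (pic_h / ctb_size) + (pic_w / ctb_size)) * ctb_size * ctb_size
        return lhs <= floor(max_luma_fs / parallelism)
    raise ValueError("sync_idc must be 0, 1 or 2")


# Four equally sized slices in a 1920x1080 picture.
print(condition_holds(0, 4, 1920, 1080, max_segment_luma=518400))
# Wavefronts with 64x64 treeblocks and an assumed MaxLumaFS of 2228224.
print(condition_holds(2, 8, 1920, 1080, ctb_size=64, max_luma_fs=2228224))
```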

The value X, denoted parallelism, can be used by a decoder to calculate the maximum number of luma samples to be processed by one thread, making the assumption that the decoder maximally utilizes the parallel decoding information. It should be noted that there might be inter-dependencies between the different threads e.g. deblocking across tile and slice boundaries or entropy synchronization. To aid decoders in planning the decoding workload distribution it is recommended that encoders set the value of parallelism_idc to the highest possible value for which one of the three conditions above is fulfilled. For the case when tiles_or_entropy_coding_sync_idc=2 that means setting parallelism_idc=floor(4*MaxLumaFS/((2*(pic_height_in_luma_samples/CtbSize)+(pic_width_in_luma_samples/CtbSize))*CtbSize*CtbSize))−4
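The per-thread workload calculation can be sketched for the slice and tile cases, where the limit is derived from the picture size. The function name and the decoder capacity used in the example (one 720p picture per thread, echoing the 720p/1080p example in the summary) are hypothetical.

```python
from math import floor


def max_luma_per_thread(parallelism_idc: int, pic_width: int, pic_height: int,
                        m: int = 4) -> int:
    """Upper bound on luma samples per thread for slices or tiles.

    Assumes real-valued division in parallelism = parallelism_idc / m + 1
    and a decoder that fully exploits the signaled parallelism.
    """
    parallelism = parallelism_idc / m + 1
    return floor(pic_width * pic_height / parallelism)


budget = max_luma_per_thread(12, 1920, 1080)   # parallelism = 4
single_thread_capacity = 1280 * 720            # hypothetical decoder capacity
fits = budget <= single_thread_capacity
print(budget, fits)
```

A decoder of this kind could accept the 1080p stream when parallelism_idc is 12 and reject it when parallelism_idc is 0.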

The condition A in this embodiment is a special case of restriction a and the encoder steps that are presented for restriction a apply for condition A.

The condition B in this embodiment is a special case of restriction b and the encoder steps that are presented for restriction b apply for condition B.

The condition C in this embodiment is a special case of restriction c and the encoder steps that are presented for restriction c apply for condition C.

Specifically, an encoder may be configured to perform the following steps in order to indicate the value of X according to condition C.

Multiple pictures constituting a video sequence with height equal to pic_height_in_luma_samples and width equal to pic_width_in_luma_samples and treeblock size equal to CtbSize are encoded using WPP.

The value X is signaled, by means of the syntax element, as the highest possible value for which restriction c is fulfilled, i.e. value X=floor(MaxLumaFS/((2*(pic_height_in_luma_samples/CtbSize)+(pic_width_in_luma_samples/CtbSize))*CtbSize*CtbSize))
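Finding the highest signalable value for wavefronts can also be sketched as a direct search that tests condition C for increasing values of parallelism_idc. The function name and the example MaxLumaFS value are assumptions; the sketch uses real-valued division and treats parallelism_idc equal to 0 as always permitted, since the constraints only apply when the parallelism is greater than 1.

```python
from math import floor


def highest_wpp_idc(pic_w: int, pic_h: int, ctb_size: int, max_luma_fs: int,
                    m: int = 4) -> int:
    """Largest parallelism_idc for which condition C still holds."""
    lhs = (2 * (pic_h / ctb_size) + (pic_w / ctb_size)) * ctb_size * ctb_size
    idc = 0
    # Raise idc while the next value still satisfies condition C.
    while lhs <= floor(max_luma_fs / ((idc + 1) / m + 1)):
        idc += 1
    return idc


idc = highest_wpp_idc(1920, 1080, 64, 2228224)   # assumed MaxLumaFS value
print(idc, idc / 4 + 1)
```

The loop terminates because the right-hand side of condition C shrinks towards zero as the parallelism grows.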

According to a further embodiment, the value of the syntax element is used to impose a restriction of the number of bytes per slice, tile or wavefront.

Accordingly, there may be a restriction on the maximum number of bytes (or bits) in one slice, tile and/or wavefront-substream as a function of the value X and optionally as a function of MaxBR and/or Max CPB size. MaxBR is the maximum bitrate for a level and CPB size is a Coded Picture Buffer size.

The restriction could be combined with one or more of the restrictions above.

Specifically, an encoder may be configured to perform the following steps in order to fulfill restriction a and a requirement on the maximum number of bytes in a slice.

1. Multiple pictures constituting a video sequence are encoded and the following is performed:

2. The treeblocks of a slice are encoded right up until before the start of the first treeblock T that fulfills one or more of the following conditions:

i. if it would have been encoded in the same slice it would have resulted in that the slice would have contained more luma samples than allowed by restriction a.

ii. if it would have been encoded in the same slice it would have resulted in that the number of bytes in the slice would be more than what is allowed by the requirement on the maximum number of bytes in a slice.

3. Instead of including that specific treeblock T in the slice that is currently coded the encoder completes the slice before T and begins a new slice with T as the first treeblock. The process continues from step two until the entire picture has been encoded.
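The slice-closing procedure can be sketched as follows. The sketch assumes that the luma sample count and coded size of each treeblock are known when the decision is made and it ignores slice header bytes; a real encoder would typically have to encode a treeblock before knowing its size. The function name and the example numbers are hypothetical.

```python
def build_slices(treeblocks: list[tuple[int, int]], max_luma: int,
                 max_bytes: int) -> list[list[int]]:
    """Group treeblocks into slices under a luma limit and a byte limit.

    treeblocks holds (luma_samples, coded_bytes) per treeblock in raster
    scan order; the result lists the treeblock indices of each slice.
    """
    slices, current, luma, size = [], [], 0, 0
    for index, (tb_luma, tb_bytes) in enumerate(treeblocks):
        # Conditions i and ii: would treeblock T break one of the limits?
        too_big = luma + tb_luma > max_luma or size + tb_bytes > max_bytes
        if current and too_big:
            slices.append(current)            # complete the slice before T
            current, luma, size = [], 0, 0    # T starts a new slice
        current.append(index)
        luma += tb_luma
        size += tb_bytes
    if current:
        slices.append(current)
    return slices


blocks = [(4096, 300), (4096, 900), (4096, 200),
          (4096, 100), (4096, 100), (4096, 50)]
result = build_slices(blocks, max_luma=3 * 4096, max_bytes=1000)
print(result)
```

In the example the first two slices are closed by the byte limit and the third by the luma sample limit.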

According to further aspects, a transmitting apparatus is provided. The transmitting apparatus comprises an encoder for encoding the bitstream. The encoder comprises a processor and an in/out-put section configured to perform the method steps as described above. Thus the encoder is configured to encode the bitstream e.g. with a level of parallelism according to the syntax element, i.e. to use the value of the syntax element to impose a restriction according to the embodiments. The in/out-put section is configured to send the syntax element. Moreover, the encoder can be implemented by a computer wherein the processor of the encoder is configured to execute software code portions stored in a memory, wherein the software code portions, when executed by the processor, carry out the respective encoder methods above.

Accordingly, a transmitting apparatus 400 for encoding a bitstream representing a sequence of pictures of a video stream is provided as illustrated in FIG. 8. The transmitting apparatus 400 comprises, as described above, an encoder 410 and a syntax element managing unit 420. According to one implementation the encoder and the syntax element managing unit are implemented by a computer 800 comprising a processor 810, also referred to as a processing unit, and a memory 820. Thus, the transmitting apparatus 400 according to this aspect comprises a processor 810 and memory 820. Said memory 820 contains instructions executable by said processor 810 whereby said transmitting apparatus 400 is operative to send a syntax element, wherein a value of the syntax element is indicative of restrictions that are enforced on the bitstream in a way that guarantees a certain level of parallelism for decoding the bitstream.

The transmitting apparatus 400 is operative to set the value of the syntax element equal to the level of parallelism. That may be achieved by the syntax element managing unit 420, e.g. implemented by the processor.

Further, the transmitting apparatus 400 may be operative to use the value of the syntax element to impose one of the restrictions a, b and c on the bitstream, e.g. by using the syntax element managing unit:

a: A maximum number of luma samples per slice is restricted as a function of the picture size and as a function of the value of the syntax element.

b: A maximum number of luma samples per tile is restricted as a function of the picture size and as a function of the value of the syntax element.

c: Wavefronts are used and treeblock size and picture height and picture width are restricted jointly as a function of the value of the syntax element and as a function of the picture size.

In addition, according to different embodiments, the transmitting apparatus 400 is operative to perform different actions relating to the division into treeblock sizes or the selection of treeblock size in order to fulfill the requirements of the restrictions referred to as a, b and c, as explained previously.

According to a further embodiment, the transmitting apparatus, by using the in/out-put unit, is operative to signal the value of the syntax element as the highest possible value for which every slice in the sequence fulfills the requirement of restriction a.

According to a further embodiment, the transmitting apparatus, by using the in/out-put unit, is operative to signal the value of the syntax element as the highest possible value for which every tile in the sequence fulfills the requirement of restriction b.

According to a further embodiment, the transmitting apparatus, by using the in/out-put unit, is operative to signal the value of the syntax element as the highest possible value for which the requirement of restriction c is fulfilled.

According to a further embodiment, the transmitting apparatus, e.g. by using the syntax element managing unit, is operative to determine the level of the parallelism as a function of the value of the syntax element, which could be exemplified by Parallelism=((value of syntax element)/4)+1.
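The exemplified relation between the value of the syntax element and the level of parallelism can be written as a short Python sketch. The function name parallelism_from_syntax_value is hypothetical. The sketch further assumes that the division by 4 is an exact (fractional) division rather than an integer division, which the text above does not state explicitly.

```python
from fractions import Fraction


def parallelism_from_syntax_value(value: int) -> Fraction:
    """Return Parallelism = (value / 4) + 1 as an exact fraction.

    Assumption: the division is exact, so that e.g. a value of 2
    corresponds to a parallelism of 3/2.
    """
    if value < 0:
        raise ValueError("the value of the syntax element must be non-negative")
    return Fraction(value, 4) + 1


if __name__ == "__main__":
    for value in (0, 2, 4, 12):
        print(value, parallelism_from_syntax_value(value))
```

Under this assumption a value of 0 corresponds to a parallelism of 1, i.e. no guaranteed parallel decoding, and every increment of 4 adds one further parallel process.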

Hence the transmitting apparatus may be configured to: when the value of the syntax element indicates restrictions that are enforced on the bitstream such that at least two parallel processes can be used for decoding, i.e. parallelism>1, ensure that one of the following conditions is fulfilled:

Condition a: If neither tiles nor wavefronts are used within the sequence to be encoded, the maximum number of luma samples in a slice should be less than or equal to floor(picture width*picture height/Parallelism).

Condition b: If tiles but not wavefronts are used within the sequence to be encoded, the maximum number of luma samples in a tile should be less than or equal to floor(picture width*picture height/Parallelism).

Condition c: If wavefronts but not tiles are used within the sequence to be encoded, the syntax elements indicating picture width and picture height and the variable indicating treeblock size are restricted such that: (2*(picture height/treeblock size)+(picture width/treeblock size))*treeblock size*treeblock size≤floor(maximum frame size/Parallelism), wherein floor(x) implies the largest integer less than or equal to x.
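As an illustration only, conditions a, b and c may be checked as in the following Python sketch. All function and parameter names are hypothetical. The sketch assumes that the divisions in condition c and in Parallelism=((value of syntax element)/4)+1 are exact rather than integer divisions, and that the largest permitted value of the syntax element (max_value) is supplied by the caller, since no range is given here.

```python
from fractions import Fraction
from math import floor
from typing import Optional, Sequence


def max_luma_samples_per_segment(width: int, height: int, parallelism) -> int:
    """floor(picture width * picture height / Parallelism), conditions a and b."""
    return floor(Fraction(width * height) / Fraction(parallelism))


def segments_within_limit(segment_sizes: Sequence[int], width: int,
                          height: int, parallelism) -> bool:
    """Condition a (slices) or b (tiles): the largest segment obeys the limit."""
    return max(segment_sizes) <= max_luma_samples_per_segment(
        width, height, parallelism)


def wavefront_condition_holds(width: int, height: int, treeblock_size: int,
                              max_frame_size: int, parallelism) -> bool:
    """Condition c, with exact (non-integer) divisions assumed."""
    lhs = ((2 * Fraction(height, treeblock_size)
            + Fraction(width, treeblock_size))
           * treeblock_size * treeblock_size)
    return lhs <= floor(Fraction(max_frame_size) / Fraction(parallelism))


def highest_syntax_value(largest_segment: int, width: int, height: int,
                         max_value: int) -> Optional[int]:
    """Highest value in 0..max_value for which condition a or b still holds.

    Returns None if the segment is larger than the whole picture.
    """
    best = None
    for value in range(max_value + 1):
        parallelism = Fraction(value, 4) + 1
        if largest_segment <= max_luma_samples_per_segment(
                width, height, parallelism):
            best = value
    return best


if __name__ == "__main__":
    w, h = 1920, 1080
    print(max_luma_samples_per_segment(w, h, 4))
    print(segments_within_limit([w * h // 4] * 4, w, h, 4))
    print(wavefront_condition_holds(w, h, 64, w * h, 4))
    print(highest_syntax_value(w * h // 4, w, h, 64))
```

For a 1920×1080 picture and a parallelism of 4, the limit for conditions a and b evaluates to 518400 luma samples per slice or tile in this sketch.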

The in/out-put unit of the transmitting apparatus may be configured to send the syntax element in a sequence parameter set.

As a yet further alternative, the transmitting apparatus is operative to use the value of the syntax element to impose a restriction on the number of bytes or bits per slice, tile or wavefront, e.g. by using the syntax element managing unit.

The encoder of the transmitting apparatus and the decoder of the receiving apparatus, respectively, can be implemented in devices such as video cameras, displays, tablets, digital TV receivers, network nodes etc. Accordingly, the embodiments apply to a transmitting apparatus and any element that operates on a bitstream (such as a network node or a Media Aware Network Element). The transmitting apparatus may for example be located in a user device, e.g. a video camera in a mobile device.

The embodiments are not limited to HEVC but may be applied to any extension of HEVC such as a scalable extension or multiview extension or to a different video codec. 

What is claimed is:
 1. A method of encoding a bitstream representing a sequence of pictures of a video stream, the method comprising: determining an encoding constraint corresponding to a known or assumed per-core processing capability of a multi-core decoder, the encoding constraint being a spatial segmentation constraint that constrains a spatial size needed to be handled by a single core of the multi-core decoder during parallel, multi-core decoding at the multi-core decoder; encoding the bitstream subject to the constraint and thereby obtaining an encoded bitstream wherein a processing capability needed for decoding separately decodable portions of the encoded bitstream does not exceed the known or assumed per-core processing capability; generating a Sequence Parameter Set (SPS) for use in decoding the encoded bitstream, and including a syntax element in the SPS that indicates the encoding constraint; and transmitting the encoded bitstream for decoding, including transmitting the SPS, for use by a multi-core decoder in determining whether its individual cores have sufficient processing capability for decoding the separately decodable portions of the encoded bitstream.
 2. The method of claim 1, wherein the constraint is a limit on the number of luma samples per picture slice, said limit being dependent on a picture size associated with the sequence of pictures.
 3. The method of claim 1, wherein encoding the bitstream subject to the constraint comprises dividing treeblocks of the pictures in the bitstream into slices such that each division consists of consecutive treeblocks in raster scan order and such that each division does not contain more luma samples than a maximum number representing said constraint.
 4. The method of claim 1, wherein the constraint is slice based, tile based, or wavefront based, and wherein the constraint defines a maximum number of bytes or bits per slice, per tile, or per wavefront.
 5. The method of claim 1, wherein the separately decodable portions of the encoded bitstream correspond to decoding threads and wherein the encoding constraint ensures that the decoding threads can be performed in parallel for a multi-core decoder having available processing cores that provide or exceed the known or assumed per-core processing capability.
 6. The method of claim 1, wherein encoding the bitstream subject to the constraint comprises encoding the bitstream to obtain separately decodable portions equal in number to a known or assumed number of processing cores at the multi-core decoder, such that any multi-core decoder having that number of available individual processing cores can decode the separately decodable portions in parallel, assuming that each available individual processing core has at least the known or assumed processing capability.
 7. The method of claim 1, wherein the method includes receiving a report from a decoder that indicates decoding capabilities of the decoder and wherein the known or assumed per-core processing capability is determined from the report.
 8. An encoding apparatus configured to encode a bitstream representing a sequence of pictures of a video stream, the encoding apparatus comprising: processing circuitry configured to: determine an encoding constraint corresponding to a known or assumed per-core processing capability of a multi-core decoder, the encoding constraint being a spatial segmentation constraint that constrains a spatial size needed to be handled by a single core of the multi-core decoder during parallel, multi-core decoding at the multi-core decoder; encode the bitstream subject to the constraint and thereby obtain an encoded bitstream wherein a processing capability needed for decoding separately decodable portions of the encoded bitstream does not exceed the known or assumed per-core processing capability; generate a Sequence Parameter Set (SPS) for use in decoding the encoded bitstream, and include a syntax element in the SPS that indicates the encoding constraint; and input/output circuitry configured to transmit the encoded bitstream for decoding, including transmitting the SPS, for use by a multi-core decoder in determining whether its individual cores have sufficient processing capability for decoding the separately decodable portions of the encoded bitstream.
 9. The encoding apparatus of claim 8, wherein the constraint is a limit on the number of luma samples per picture slice, said limit being dependent on a picture size associated with the sequence of pictures.
 10. The encoding apparatus of claim 8, wherein the processing circuitry is configured to encode the bitstream subject to the constraint by dividing treeblocks of the pictures in the bitstream into slices such that each division consists of consecutive treeblocks in raster scan order and such that each division does not contain more luma samples than a maximum number representing said constraint.
 11. The encoding apparatus of claim 8, wherein the constraint is slice based, tile based, or wavefront based, and wherein the constraint defines a maximum number of bytes or bits per slice, per tile, or per wavefront.
 12. The encoding apparatus of claim 8, wherein the separately decodable portions of the encoded bitstream correspond to decoding threads and wherein the encoding constraint ensures that the decoding threads can be performed in parallel for a multi-core decoder having available processing cores that provide or exceed the known or assumed per-core processing capability.
 13. The encoding apparatus of claim 8, wherein the processing circuitry is configured to encode the bitstream subject to the constraint by encoding the bitstream to obtain separately decodable portions equal in number to a known or assumed number of processing cores at the multi-core decoder, such that any multi-core decoder having that number of available individual processing cores can decode the separately decodable portions in parallel, assuming that each available individual processing core has at least the known or assumed processing capability.
 14. The encoding apparatus of claim 8, wherein the processing circuitry is configured to receive, via the input/output circuitry, a report from a decoder that indicates decoding capabilities of the decoder and wherein the processing circuitry is configured to determine the known or assumed per-core processing capability from the report. 